OpenAI releases Point-E, an AI that generates 3D models • TechCrunch
The next breakthrough to take the AI world by storm might be 3D model generators. This week, OpenAI open sourced Point-E, a machine learning system that creates a 3D object given a text prompt. According to a paper published alongside the code base, Point-E can produce 3D models in one to two minutes on a single Nvidia V100 GPU. Point-E doesn't create 3D objects in the traditional sense. Rather, it generates point clouds, or discrete sets of data points in space that represent a 3D shape -- hence the cheeky abbreviation.
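The point-cloud representation the article describes can be illustrated with a toy example: a set of xyz coordinates, each paired with an RGB colour. The sketch below only mimics that output format (points sampled on a unit sphere) — it is not Point-E's diffusion pipeline, and the function name and sizes are illustrative assumptions.

```python
import numpy as np

def random_sphere_point_cloud(n_points=1024, seed=0):
    """Toy point cloud: n_points xyz coordinates sampled uniformly on the
    unit sphere, each paired with an RGB colour. This mirrors only the
    coordinates-plus-colour layout of a coloured point cloud, not how
    Point-E actually generates one."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_points, 3))
    xyz = v / np.linalg.norm(v, axis=1, keepdims=True)  # project onto sphere
    rgb = rng.uniform(0.0, 1.0, size=(n_points, 3))     # per-point colour
    return xyz, rgb

xyz, rgb = random_sphere_point_cloud()
```

Because the shape is just an (N, 3) coordinate array plus an (N, 3) colour array, a point cloud is far cheaper to produce than a mesh — consistent with the article's one-to-two-minutes-on-one-GPU claim.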
Google Creates AI That Turns Text Into 3D Objects
DreamFusion, Google's next-gen, AI-powered text-to-3D-image generator, is here. A proof-of-concept paper is here, at least. DreamFusion is an evolution of Dream Fields, a text-to-3D-image generator revealed by Google back in 2021. And like Dream Fields, DreamFusion creates its 3D images by combining a Neural Radiance Field (NeRF) -- a neural network that can create synthetic 3D scenes using partial 2D datasets -- with a pre-trained text-to-image model. Unlike Dream Fields, which utilized OpenAI's CLIP technology as that latter pre-trained model, DreamFusion uses Google's own: Imagen, the company's DALL-E 2 competitor.
Google's new artificial intelligence turns text into 3D objects
Google originally unveiled its generative 3D AI system, Dream Fields, in 2021, and now a new and improved version has arrived: DreamFusion, Google's next-generation artificial intelligence software for converting text into 3D generated images. So, how does this work? In a proof-of-concept paper published to the pre-print server arXiv, researchers explain that DreamFusion, much like Dream Fields, uses a Neural Radiance Field (NeRF), a neural network designed to generate novel views of complex 3D scenes using 2D datasets. However, DreamFusion takes a different approach than Dream Fields: as Google research scientist Ben Poole explained on Twitter, the team replaced OpenAI's CLIP technology, which powered Dream Fields, with Google's own AI model, Imagen. The resulting 3D models aren't as photo-realistic as what we've seen with Midjourney.
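The swap from CLIP to Imagen works through score distillation: DreamFusion renders the NeRF, noises the render, and uses the frozen diffusion model's noise prediction as a gradient signal. A heavily simplified numpy sketch of that step is below — `noise_pred_fn` is a stand-in for the frozen text-conditioned diffusion model, and the linear noise schedule is a toy assumption, not the paper's.

```python
import numpy as np

def sds_gradient(rendered, noise_pred_fn, t, rng):
    """One score-distillation step, sketched. `noise_pred_fn` plays the role
    of the frozen text-to-image diffusion model (Imagen in the paper): any
    callable mapping (noisy_image, t) -> predicted noise. The schedule here
    is a toy linear one chosen for illustration."""
    eps = rng.normal(size=rendered.shape)                  # injected noise
    alpha = 1.0 - t                                        # toy noise level
    noisy = np.sqrt(alpha) * rendered + np.sqrt(1.0 - alpha) * eps
    eps_hat = noise_pred_fn(noisy, t)                      # frozen model output
    w = 1.0 - alpha                                        # weighting w(t)
    # Gradient pushed back into the NeRF's rendered image: the diffusion
    # model's disagreement with the noise that was actually injected.
    return w * (eps_hat - eps)
```

If the frozen model already "agrees" with a render (its predicted noise matches the injected noise), the gradient vanishes — which is why optimizing this signal pulls the NeRF's renders toward images the text-to-image model considers likely for the prompt.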
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Lee, Han-Hung, Chang, Angel X.
We explore the task of text to 3D object generation using CLIP. Specifically, we use CLIP for guidance without access to any datasets, a setting we refer to as pure CLIP guidance. While prior work has adopted this setting, there is no systematic study of mechanics for preventing adversarial generations within CLIP. We illustrate how different image-based augmentations prevent the adversarial generation problem, and how the generated results are impacted. We test different CLIP model architectures and show that ensembling different models for guidance can prevent adversarial generations within bigger models and generate sharper results. Furthermore, we implement an implicit voxel grid model to show how neural networks provide an additional layer of regularization, resulting in better geometrical structure and coherency of generated objects. Compared to prior work, we achieve more coherent results with higher memory efficiency and faster training speeds.
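The abstract's two main findings — image augmentations prevent adversarial generations, and ensembling CLIP models sharpens results — amount to averaging the caption score over many perturbed views and several encoders. A toy numpy sketch of that objective is below; the encoders are stand-in callables (image -> embedding), not real CLIP models, and the roll/jitter augmentation is an illustrative placeholder for the crops and warps the paper actually studies.

```python
import numpy as np

def ensemble_clip_score(image, text_embed, encoders, n_augs=8, seed=0):
    """Sketch of a pure-guidance objective: score a rendered image against a
    caption embedding, averaged over random augmentations and an ensemble of
    image encoders. A single un-augmented encoder is easy to fool with
    adversarial textures; averaging makes cheating all views at once hard."""
    rng = np.random.default_rng(seed)
    text_embed = text_embed / np.linalg.norm(text_embed)
    scores = []
    for _ in range(n_augs):
        aug = np.roll(image, rng.integers(-4, 5), axis=0)   # shift "crop"
        aug = aug + rng.normal(0.0, 0.01, size=aug.shape)   # pixel jitter
        for enc in encoders:
            z = enc(aug)
            z = z / np.linalg.norm(z)
            scores.append(float(z @ text_embed))            # cosine similarity
    return float(np.mean(scores))
```

Maximising this averaged score, rather than a single image-text similarity, is what the abstract means by preventing adversarial generations "within CLIP".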
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > Canada > Alberta (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Apple's new GAUDI AI turns text prompts into 3D scenes
Apple has shown off its latest AI system, GAUDI. It can generate 3D indoor scenes and forms the foundation for a new generation of generative AI based on NeRFs. So-called neural rendering brings artificial intelligence to computer graphics: AI researchers at Nvidia, for example, have shown how 3D objects can be created from photos, while Google relies on Neural Radiance Fields (NeRFs) for Immersive View and is developing NeRFs for rendering people. So far, NeRFs have mainly been used as a kind of neural storage medium for 3D models and 3D scenes, which can then be rendered from different camera perspectives; this is how the frequently shown camera movements through a room or around an object are created.
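The "rendered from different camera perspectives" step the article mentions is standard NeRF volume rendering: densities and colours sampled along each camera ray are composited into one pixel. A minimal numpy sketch of that compositing step (per ray, with illustrative inputs) is:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Standard NeRF volume rendering for one ray: turn per-sample densities
    and colours into a single pixel colour. This compositing is what lets a
    trained NeRF be replayed from any camera pose."""
    alpha = 1.0 - np.exp(-densities * deltas)        # per-sample opacity
    # Transmittance: probability the ray survives to reach each sample.
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = T * alpha                              # contribution per sample
    rgb = (weights[:, None] * colors).sum(axis=0)    # composited colour
    return rgb, float(weights.sum())                 # pixel colour + opacity
```

Sweeping the camera pose and re-running this compositing for every pixel produces exactly the fly-through videos the article describes.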
- Media > Television (0.43)
- Media > Photography (0.43)
- Media > Film (0.43)
Zero-Shot Text-Guided Object Generation with Dream Fields
Jain, Ajay, Mildenhall, Ben, Barron, Jonathan T., Abbeel, Pieter, Poole, Ben
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
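The abstract's objective — render views that score highly under CLIP, regularised by sparsity-inducing transmittance — can be sketched as a two-term loss. The constants `tau` and `lam` below are illustrative stand-ins, not the paper's annealed schedule, and `clip_score` / `mean_transmittance` are assumed to come from the rendering loop.

```python
def dream_fields_loss(clip_score, mean_transmittance, tau=0.88, lam=0.5):
    """Sketch of the Dream Fields objective: maximise the CLIP image-text
    score of rendered views while encouraging sparse geometry via a
    transmittance target. tau caps the reward for transparency; lam trades
    the two terms off."""
    l_clip = -clip_score                      # higher CLIP score -> lower loss
    # Sparsity prior: reward average transmittance up to the target tau, so
    # the NeRF leaves most of the scene empty rather than filling it with fog.
    l_transmit = -min(tau, mean_transmittance)
    return l_clip + lam * l_transmit
```

The cap at `tau` matters: without it, the optimiser could drive the scene fully transparent; with it, transparency beyond the target earns no further reward, so some opaque geometry must remain to raise the CLIP term.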
- North America > United States > Oklahoma > Beaver County (0.04)
- Asia (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)